Nearest Neighbor Search using Kd-trees

Author

  • Rina Panigrahy
Abstract

We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search that improves its performance. The kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades once the number of dimensions increases beyond three. Since the exact nearest neighbor search problem suffers from the curse of dimensionality, we focus on approximate solutions; a c-approximate nearest neighbor is any neighbor within distance at most c times the distance to the nearest neighbor. We show that for a randomly constructed database of points, the traditional kd-tree search algorithm has a very low probability of finding an approximate nearest neighbor; the probability of success drops exponentially in the number of dimensions d, as e^(…). However, a simple change to the search algorithm results in a much higher chance of success. Instead of searching for the query point in the kd-tree, we search for a random set of points in the neighborhood of the query point. It turns out that searching for e^(…) such points can find the c-approximate nearest neighbor with a much higher chance of success.
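
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the median-split tree, the single-leaf descent without backtracking, the Gaussian perturbation, and the names `build`, `descend`, `plain_search`, `perturbed_search`, `radius` and `probes` are all assumptions made for illustration. The sketch compares a lookup of the query point alone with a lookup that also descends the tree for random points near the query.

```python
import math
import random


def build(points, depth=0, leaf_size=1):
    """Build a kd-tree by median split, cycling through the coordinates.

    Assumes a non-empty list of equal-length points.
    """
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build(pts[:mid], depth + 1, leaf_size),
        "right": build(pts[mid:], depth + 1, leaf_size),
    }


def descend(node, q):
    """Follow q down to a single leaf, with no backtracking."""
    while "leaf" not in node:
        go_left = q[node["axis"]] < node["split"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def plain_search(tree, q):
    """Return the closest point to q among those in q's own leaf."""
    return min(descend(tree, q), key=lambda p: math.dist(p, q))


def perturbed_search(tree, q, radius, probes, rng):
    """Also look up `probes` random points near q and keep the best candidate.

    Assumption: each probe is q plus Gaussian noise whose typical length
    is about `radius`; the source does not specify the distribution here.
    """
    best = plain_search(tree, q)
    sigma = radius / math.sqrt(len(q))
    for _ in range(probes):
        probe = [x + rng.gauss(0.0, sigma) for x in q]
        cand = min(descend(tree, probe), key=lambda p: math.dist(p, q))
        if math.dist(cand, q) < math.dist(best, q):
            best = cand
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
    tree = build(data)
    query = tuple(rng.random() for _ in range(8))
    print("plain    :", math.dist(plain_search(tree, query), query))
    print("perturbed:", math.dist(perturbed_search(tree, query, 0.2, 50, rng), query))
```

Because the query's own leaf is always examined first, the perturbed lookup in this sketch never returns a worse candidate than the plain lookup; how much better it does depends on the assumed `radius` and `probes` values.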

Similar resources

An Improved Algorithm Finding Nearest Neighbor Using Kd-trees

We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than two. Since the exact nearest neighbor search problem suffers from the curse of dimension...

Which Space Partitioning Tree to Use for Search - Summary

Trees like binary-space-partitioning trees, kd-trees, principal axis trees and random projection trees are used to answer the question "which tree to use for nearest-neighbor search?" This paper deals with the influence of the vector quantization performance of the trees on the search performance and the margins of the partitions in these trees. Theoretical results show that both factors have ...

RapidPolygonLookup: An R package for polygon lookup using kd trees

Coordinate level spatial data need to be frequently aggregated to higher geographical identities like census blocks, ZIP codes or police district boundaries for analysis. This process requires mapping each point in the given data set to an individual element of the desired geographical hierarchy. Unless efficient data structures are used, this can be a daunting task. The operation point.in.poly...

Randomly Projected KD-Trees with Distance Metric Learning for Image Retrieval

Efficient nearest neighbor (NN) search techniques for high-dimensional data are crucial to content-based image retrieval (CBIR). Traditional data structures (e.g., kd-tree) usually are only efficient for low dimensional data, but often perform no better than a simple exhaustive linear search when the number of dimensions is large enough. Recently, approximate NN search techniques have been propo...

Probabilistic cost model for nearest neighbor search in image retrieval

We present a probabilistic cost model to analyze the performance of the kd-tree for nearest neighbor search in the context of content-based image retrieval. Our cost model measures the expected number of kd-tree nodes traversed during the search query. We show that our cost model has high correlations with both the observed number of traversed nodes and the runtime performance of search queries...

Publication date: 2006